17 research outputs found

    Enhancing Low-resolution Face Recognition with Feature Similarity Knowledge Distillation

    Full text link
    In this study, we introduce a feature knowledge distillation framework to improve low-resolution (LR) face recognition performance using knowledge obtained from high-resolution (HR) images. The proposed framework transfers informative features from an HR-trained network to an LR-trained network by reducing the distance between them. A cosine similarity measure was employed as a distance metric to effectively align the HR and LR features. This approach differs from conventional knowledge distillation frameworks, which use the L_p distance metrics and offer the advantage of converging well when reducing the distance between features of different resolutions. Our framework achieved a 3% improvement over the previous state-of-the-art method on the AgeDB-30 benchmark without bells and whistles, while maintaining a strong performance on HR images. The effectiveness of cosine similarity as a distance metric was validated through statistical analysis, making our approach a promising solution for real-world applications in which LR images are frequently encountered. The code and pretrained models are publicly available on https://github.com/gist-ailab/feature-similarity-KD

    SleePyCo: Automatic Sleep Scoring with Feature Pyramid and Contrastive Learning

    Full text link
    Automatic sleep scoring is essential for the diagnosis and treatment of sleep disorders and enables longitudinal sleep tracking in home environments. Learning-based automatic sleep scoring on single-channel electroencephalogram (EEG) signals has been actively studied because obtaining multi-channel signals during sleep is difficult. However, learning representations from raw EEG signals is challenging owing to the following issues: 1) sleep-related EEG patterns occur on different temporal and frequency scales and 2) sleep stages share similar EEG patterns. To address these issues, we propose a deep learning framework named SleePyCo that incorporates 1) a feature pyramid and 2) supervised contrastive learning for automatic sleep scoring. For the feature pyramid, we propose a backbone network named SleePyCo-backbone that considers multiple feature sequences on different temporal and frequency scales. Supervised contrastive learning allows the network to extract class-discriminative features by minimizing the distance between intra-class features and simultaneously maximizing the distance between inter-class features. Comparative analyses on four public datasets demonstrate that SleePyCo consistently outperforms existing frameworks based on single-channel EEG. Extensive ablation experiments show that SleePyCo exhibits enhanced overall performance, with significant improvements in discrimination between the N1 and rapid eye movement (REM) stages. Comment: 14 pages, 3 figures, 8 tables
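
    The supervised contrastive objective mentioned above can be sketched as a standard SupCon-style loss; the PyTorch snippet below is an illustrative version with assumed shapes and temperature, not SleePyCo's exact implementation.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss over a batch of embeddings.
    features: (B, D) embeddings, labels: (B,) sleep-stage ids.
    Same-class pairs are pulled together, different-class pairs pushed apart."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature                 # (B, B) similarities
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(self_mask, -1e9)                    # exclude self-pairs
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss[pos_mask.any(1)].mean()                       # anchors with positives

# toy usage: 8 embeddings of dimension 128, 5 sleep stages
feats = torch.randn(8, 128, requires_grad=True)
stages = torch.randint(0, 5, (8,))
supervised_contrastive_loss(feats, stages).backward()
```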

    Block Selection Method for Using Feature Norm in Out-of-distribution Detection

    Full text link
    Detecting out-of-distribution (OOD) inputs during the inference stage is crucial for deploying neural networks in the real world. Previous methods commonly relied on the output of a network derived from the highly activated feature map. In this study, we first reveal that the norm of a feature map obtained from a block other than the last block can be a better indicator for OOD detection. Motivated by this, we propose a simple framework consisting of FeatureNorm, a norm of the feature map, and NormRatio, a ratio of FeatureNorm for ID and OOD samples that measures the OOD detection performance of each block. To select the block that provides the largest difference between the FeatureNorm of ID and OOD samples, we create jigsaw-puzzle images from ID training samples as pseudo-OOD data, calculate NormRatio for each block, and select the block with the largest value. With the selected block, OOD detection using FeatureNorm outperforms other OOD detection methods, reducing FPR95 by up to 52.77% on the CIFAR10 benchmark and by up to 48.53% on the ImageNet benchmark. We demonstrate that our framework generalizes to various architectures and highlight the importance of block selection, which can also improve previous OOD detection methods. Comment: 11 pages including references, 5 figures, and 5 tables
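
    A compact sketch of the block-selection idea in PyTorch; the precise norm used for FeatureNorm, the tensor shapes, and the block names are assumptions made for illustration.

```python
import torch

def feature_norm(feature_map):
    """FeatureNorm sketch: per-sample norm of a (B, C, H, W) feature map,
    here the channel-wise L2 norm averaged over spatial positions."""
    return feature_map.norm(p=2, dim=1).flatten(1).mean(dim=1)   # (B,)

def norm_ratio(id_feature_map, pseudo_ood_feature_map):
    """NormRatio of one block: mean FeatureNorm on ID samples divided by
    mean FeatureNorm on pseudo-OOD samples (e.g. jigsaw-shuffled ID images).
    The block with the largest ratio is selected for OOD scoring."""
    return feature_norm(id_feature_map).mean() / feature_norm(pseudo_ood_feature_map).mean()

# toy usage: compare two candidate blocks and keep the better one
id_feats = {"block3": torch.rand(16, 256, 8, 8), "block4": torch.rand(16, 512, 4, 4)}
ood_feats = {"block3": torch.rand(16, 256, 8, 8), "block4": torch.rand(16, 512, 4, 4)}
best_block = max(id_feats, key=lambda b: norm_ratio(id_feats[b], ood_feats[b]))
```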

    INSTA-BEEER: Explicit Error Estimation and Refinement for Fast and Accurate Unseen Object Instance Segmentation

    Full text link
    Efficient and accurate segmentation of unseen objects is crucial for robotic manipulation. However, it remains challenging due to over- or under-segmentation. Although existing refinement methods can enhance the segmentation quality, they fix only minor boundary errors or are not sufficiently fast. In this work, we propose INSTAnce Boundary Explicit Error Estimation and Refinement (INSTA-BEEER), a novel refinement model that allows for adding and deleting instances and sharpening boundaries. Leveraging an error-estimation-then-refinement scheme, the model first estimates the pixel-wise boundary explicit errors: the true positive, true negative, false positive, and false negative pixels of the instance boundary in the initial segmentation. It then refines the initial segmentation using these error estimates as guidance. Experiments show that the proposed model significantly enhances segmentation, achieving state-of-the-art performance. Furthermore, with a fast runtime (less than 0.1 s), the model consistently improves performance across various initial segmentation methods, making it highly suitable for practical robotic applications. Comment: 8 pages, 5 figures
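
    For intuition, the four pixel-wise boundary error classes named in the abstract can be computed from boolean boundary masks as below; this NumPy sketch only shows the definitions of the error maps, not the learned error-estimation network itself.

```python
import numpy as np

def boundary_error_maps(pred_boundary, gt_boundary):
    """Pixel-wise boundary error maps: TP, TN, FP, FN boolean masks.
    pred_boundary, gt_boundary: (H, W) boolean boundary masks of the
    initial segmentation and the ground truth, respectively."""
    tp = pred_boundary & gt_boundary      # boundary correctly predicted
    tn = ~pred_boundary & ~gt_boundary    # background correctly predicted
    fp = pred_boundary & ~gt_boundary     # spurious boundary pixels
    fn = ~pred_boundary & gt_boundary     # missed boundary pixels
    return tp, tn, fp, fn

# toy usage on a 4x4 grid
pred = np.zeros((4, 4), dtype=bool); pred[1, :] = True
gt = np.zeros((4, 4), dtype=bool);   gt[2, :] = True
tp, tn, fp, fn = boundary_error_maps(pred, gt)
```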

    Learning to Place Unseen Objects Stably using a Large-scale Simulation

    Full text link
    Object placement is a fundamental task for robots, yet it remains challenging for partially observed objects. Existing methods for object placement have limitations, such as requiring a complete 3D model of the object or being unable to handle complex shapes and novel objects, which restricts the applicability of robots in the real world. Herein, we focus on addressing the Unseen Object Placement (UOP) problem. We tackle the UOP problem with two components: (1) UOP-Sim, a large-scale dataset that accommodates various shapes and novel objects, and (2) UOP-Net, a point cloud segmentation-based approach that directly detects the most stable plane from partial point clouds. Our UOP approach enables robots to place objects stably even when the object's shape and properties are not fully known, thus providing a promising solution for object placement in various environments. We verify our approach through simulation and real-world robot experiments, demonstrating state-of-the-art performance for placing single-view, partially observed objects. Robot demos, code, and the dataset are available at https://gistailab.github.io/uop/ Comment: 8 pages (main
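
    UOP-Net itself is a learned point-cloud segmentation model; the NumPy sketch below is not the paper's method, only an illustration of the underlying geometric notion of a "stable plane" via a least-squares plane fit and a gravity-alignment score, with assumed point and gravity conventions.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit. points: (N, 3) candidate support points.
    Returns the centroid and the unit normal of the best-fit plane."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                                  # direction of least variance
    return centroid, normal / np.linalg.norm(normal)

def stability_score(points, gravity=np.array([0.0, 0.0, -1.0])):
    """Crude stability proxy: |cos| between the plane normal and gravity.
    A value near 1.0 means the fitted face is nearly horizontal."""
    _, normal = fit_plane(points)
    return abs(float(normal @ gravity))

# toy usage: a roughly horizontal patch of points scores close to 1.0
patch = np.random.rand(100, 3) * [0.1, 0.1, 0.001]
print(stability_score(patch))
```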

    Deep Learning Based Detection of Missing Tooth Regions for Dental Implant Planning in Panoramic Radiographic Images

    No full text
    Dental implantation is a surgical procedure in oral and maxillofacial surgery. Detecting missing tooth regions is essential for planning dental implant placement. This study proposes an automated method that detects regions of missing teeth in panoramic radiographic images. Tooth instance segmentation is required to accurately detect a missing tooth region in panoramic radiographic images containing obstacles such as dental appliances or restorations. Therefore, we constructed a dataset that contains 455 panoramic radiographic images with annotations for tooth instance segmentation and missing tooth region detection. First, the segmentation model segments the teeth in the panoramic radiographic image and generates tooth masks. Second, a detection model uses the tooth masks as input to predict regions of missing teeth. Finally, the detection model identifies the position and number of missing teeth in the panoramic radiographic image. We achieved 92.14% mean Average Precision (mAP) for tooth instance segmentation and 59.09% mAP for missing tooth region detection. As a result, this method assists clinicians in detecting missing tooth regions for implant placement.
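
    A structural sketch of the two-stage pipeline described above; seg_model and det_model are hypothetical placeholders for the trained segmentation and detection networks, not the authors' code.

```python
def detect_missing_teeth(panoramic_image, seg_model, det_model):
    """Two-stage pipeline sketch: tooth instance segmentation first,
    then missing-tooth-region detection on the resulting masks."""
    tooth_masks = seg_model(panoramic_image)     # per-tooth instance masks
    missing_regions = det_model(tooth_masks)     # predicted missing-tooth regions
    return missing_regions                       # positions (and count) of missing teeth
```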

    BattleSound: A Game Sound Benchmark for the Sound-Specific Feedback Generation in a Battle Game

    No full text
    A haptic sensor coupled to a gamepad or headset is frequently used to enhance the sense of immersion for game players. However, providing haptic feedback for appropriate sound effects requires specialized audio engineering techniques to identify target sounds that vary from game to game. We propose a deep learning-based method for sound event detection (SED) to determine the optimal timing of haptic feedback in extremely noisy environments. To accomplish this, we introduce the BattleSound dataset, which contains a large volume of game sound recordings of game effects and other distracting sounds, including voice chats from a PlayerUnknown’s Battlegrounds (PUBG) game. Given the highly noisy and distracting nature of war-game environments, we set the annotation interval to 0.5 s, which is significantly shorter than in existing SED benchmarks, to increase the likelihood that each annotated label contains sound from a single source. As a baseline, we adopt mobile-sized deep learning models to perform two tasks: weapon sound event detection (WSED) and voice chat activity detection (VCAD). The accuracy of the models trained on BattleSound was greater than 90% for both tasks; thus, BattleSound enables real-time game sound recognition in noisy environments via deep learning. In addition, we demonstrate that performance degraded significantly when the annotation interval was greater than 0.5 s, indicating that BattleSound's short annotation intervals are advantageous for SED applications that demand real-time inference.
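
    As a small illustration of the 0.5 s annotation interval described above, the NumPy sketch below splits a waveform into non-overlapping 0.5 s windows so that each window can carry a single label; the 16 kHz sample rate is an assumption, not a dataset specification.

```python
import numpy as np

def frame_audio(waveform, sample_rate, window_sec=0.5):
    """Split a mono waveform into non-overlapping 0.5 s windows; each
    window receives one label (e.g. weapon sound / voice chat present)."""
    hop = int(sample_rate * window_sec)
    n_windows = len(waveform) // hop
    return waveform[: n_windows * hop].reshape(n_windows, hop)

# toy usage: 3 s of 16 kHz audio -> 6 windows of 8000 samples each
audio = np.random.randn(3 * 16000).astype(np.float32)
windows = frame_audio(audio, 16000)    # shape (6, 8000)
```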